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Based on multiple short molecular dynamics simulation trajectories that start from dispersiveiy 
selected initial conformations, we have designed the weighted ensemble dynamics (WED) method 
[Phys Rev. E 80, 026707 (2009)] to robustly and systematically explore complex conformational 
spaces through the spectral analysis of covariance matrix of trajectory-mapped vectors. In WED, 
the non-degenerate ground state of the covariance matrix directly predicts the ergodicity of simula- 
tion data, and the ground state can be adopted as the statistical weights of trajectories to correctly 
reconstruct equilibrium properties, even though each trajectory only explores part of the confor- 
mational space. Otherwise, the degree of degeneracy simply provides the number of the kinetically 
isolated meta-stable states of the system under the time scale of the individual trajectory. Ma- 
nipulations on the eigenvectors of the matrix with the first a few small eigenvalues leads to the 
classification of trajectories into non-transition ones within the metastable states and transition 
ones between these states. The transition state ensemble may also be predicted only based on these 
eigenvectors without a priori knowledge of the system. Here, we have further improved the WED 
method. Several schemes have been designed respectively for depressing the statistical error, system- 
atically selecting the physical variables in analysis, and determining the only adjustable parameter 
of WED. Besides, we illustrate the application of WED in more complicated examples, including a 
system with a transition network among multiple states, a system with entropy-dominated states, 
and a solvated alanine dipeptide system at low temperature. WED is found to correctly capture the 
topology of the transition network. It can also identify the solvent effect for the solvated alanine 
dipeptide system, as well as the diffusion-like dynamical behavior both in the entropy-dominated 
system and in the low-temperature solvated alanine dipeptide. These examples will serve as the 
references for the application of WED to more complicated biological macromolecules. 

PACS numbers: 02.70.Ns; 87.15.A-; 82.20.Wt 



I. INTRODUCTION 



Exploring the high-dimensional conformational space 
of molecular systems has long been the focus of com- 
puter simulation studies. Two challenges exist under 
this topic, i.e., how to thoroughly sample the confor- 
mational space, and how to understand the equilibrium 
(and dynamic) properties based on lots of simulation 
data in hand. Although standard Monte-Carlo (MC) 
and molecular-dynamics (MD) simulation methods have 
been widely applied to sample the equilibrium conforma- 
tional distribution, for many complex systems, such as 
glassy systems, phase-transition systems, and solvated 
biopolymers, sufficient sampling is by no means an easy 
task. With increased degrees of freedom and finite (low) 
temperature, the potential energy landscapes of these 
systems are characterized by exponentially growing lo- 
cal energy minimums that are separated by large energy 
barriers (compared to thermal energy fcgT, where fcs is 
the Boltzmann's constant and T is the temperature). A 
rough energy landscape frequently traps the simulation 
trajectories in a small area of the conformational space, 
disapproving the thorough exploration of the whole space 
within limited simulation time. Once this happens, the 
ergodicity of the simulation data is effectively broken, 
e.g., the time average of some physical quantities can 
not converge to their average values within the equilib- 



rium ensemble. To overcome these difficulties, beyond 
the standard MD and MC sampling algorithms, vari- 
ous advanced techniques 
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been developed to more efficiently sample systems with 
rugged energy (or free-energy) surface. Reviews of the 
advanced simulation techniques can be found in refer- 
ence [H, [25|, [H, [I?}. These methods can prominently 
accelerate the exploring in conformational spaces, while 
preserving the ability to reproduce the correct equilib- 
rium distributions. Despite the success already achieved, 
it is still difficult to thoroughly investigate a practically 
interesting system even based on these advanced tech- 
niques. Based on multiple parallel generated tra ject ories, 
the ensemble dynamics (ED) method [H, l29l l30j pro- 
vides an alternative way for abridging the gap between 
simulation power and practical interest. Benefiting from 
the growing power of massively distributed computation, 
the ED method can reduce the wall-clock time required 
for data production. Furthermore, if the ED trajectories 
were spawned from dispersiveiy selected conformations, 
then compared to single-trajectory simulations, larger 
area in conformational space might be covered by the 
ED simulations within the same CPU time. Besides, the 
multiple simulation trajectories that start from different 
conformational regions enable the ergodicity test of the 
simulation data. For example, the measure of ergodicity 
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could be denned as [U [H[ , 

d{t) = ^l\{Rt{t)-Rf{t))\ (1) 

where, (i = 1, . . . , Nr) denotes a set of physical 

quantities that are able to differentiate the possibly dis- 
connected conformational regions. Rf{t) and Rf(t) are 
the time average of R4 over the beginning t time length 
of the trajectories that respectively start from conforma- 
tion A and B. For ergodicity-broken simulations, d(t) 
will converge to a non-zero value without prominent de- 
creasing in a limited simulation time, while for an ergodic 
and converged simulation, this function will relax to zero. 

The proper analysis and understanding of the sim- 
ulation data may be as important as the data sam- 
pling itself, particularly for complex systems with lots 
of degrees of freedom. Except for testing the ergodic- 
ity of simulation data, the major purpose in data an- 
alyzing is to search for a coarse-grained description of 
the conformational space. In the traditional way, a few 
(usually only one or two) knowledge-based reaction co- 
ordinates (or order parameters) are first defined, then 
the equilibrium (and dynamic) properties of the system 
are achieved by mapping sampled conformations in the 
low-dimensional space spanned by the presumed reac- 
tion coordinates. However, this method may suffer from 
the arbitrariness in the selection of the reaction coor- 
dinates, and the oversimplified description of the sys- 
tem |33f j - Recently, a more systematic, conformation- 
network-based view of dynamics brings prominent im- 
provement [3J, l35|, l3y, l37|, l38|, [39|, |40|, |4l| . In the kinetic 
network models, the conformational space is divided into 
some meta-stable regions. If the transition events be- 
tween these regions were approximated with first-order 
reactions, the whole system could be coarse-grained into 
a Markov transition network [H, 0, Hl[. More works 
about Markov network can be found in the current re- 
view (33j . Being theoretically transparent and inclusive 
of nearly all the useful information, the network model 
has received much attention. 

The three aspects of sufficient sampling, ergodicity 
testing, and data mining are quite relevant to each other. 
To establish such a network, it is necessary to thoroughly 
sample the conformational space and to suitably analyze 
the simulation data. The understanding of metastable 
states is definitely helpful for guarding more efficiently 
sampling. In the previous work [42| . we have proposed 
a general method, weighted ensemble dynamics (WED), 
as an integrated view of these challenges. WED is based 
on multiple short MD (or MC) simulation trajectories, 
which are started from dispersively-chosen initial confor- 
mations, and generated under the same simulation con- 
dition (such as the same Hamiltonian, temperature, pres- 
sure, in one word, the same equations of motion). Some 
of these trajectories may reach local equilibrium within 
metastable states, the other may happen transition be- 
tween these states. Although each single trajectory is 
too short to reach the global equilibrium, the trajectories 



as a whole may have covered whole the conformational 
space, and should be able to be combined to reproduce 
the equilibrium distribution of the system. The combined 
analysis of parallel trajectories already exists, such as the 
weighted histogram analysis method (WHAM) [43J. In 
WHAM, the energetic density of state (DOS) of the simu- 
lation system is first estimated by analyzing several tra- 
jectories simulated at different temperatures, then the 
physical properties are derived out with the DOS. For 
trajectories simulated under the same condition, WED 
provides a simpler way for combining trajectories, i.e., 
specifying each trajectory a statistical weight. Besides, 
WED is capable to establish the network of metastable 
states, as well as to test the ergodicity of simulation data. 

In current paper, we will further improve and validate 
the WED method by: (i) discussing the approximations 
in WED and providing a practical scheme to select the 
unique adjustable parameter in WED; (ii) providing an 
improvement of WED to depress the statistical errors; 
(iii) illustrating the application of WED to some more 
extremal examples, such as, a system with a transition 
network among multiple states, a system with an entropic 
state (i.e, a state with large entropy and long intrastate 
equilibration time), and the solvated alanine dipeptide 
simulated both at room temperature and at low temper- 
ature. We find that WED can correctly detect the topol- 
ogy of the transition network, predict the diffusion-like 
dynamical property for both the system with an entropic 
state and the dipeptide simulated at low temperature. 
More importantly, we apply WED to separately analyze 
effects of different aspects (degrees of freedom, interac- 
tions etc.) in metastable states and dynamics of complex 
systems, such as the solvent effects in the solvated dipep- 
tide. These examples will serve as the references for the 
application of WED to more complicated systems. 

The article is organized as follows. In sect. UH we re- 
derive the central results of WED which was briefly pre- 
sented in our previous paper [4^]. In sect. IIIIl we discuss 
approximations of WED, such as, the completeness of 
basis functions, the effects of the unique free parame- 
ter, and present some tactics for facilitating applications 
of WED to improve statistical errors. In sect. IIV1 we 
discuss the application of WED in solved dipeptide at 
room temperature and low temperature. A short con- 
clusion is given in sect. |Vj Finally, different trajectory 
re- weighting methods, and application of WED in a sys- 
tem with multiple-state transition network and a system 
with entropic-dominated states are presented in the Ap- 
pendix. 



II. THEORY OF WED 

Most of the simulation algorithms can fit into the 
framework of probability evolution, 

P(q,t) = ( G(q,t;q',0)P(q',0)dq'. (2) 
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Here P(q, t) is the probability density function of the 
system at time i, with q (and q') denoting the gener- 
alized conformational coordinates. G(q, t;q' , 0) is the 
propagator of the system under the applied simula- 
tion algorithm. The initial probability density func- 
tion, Pinit{q) = P(q,0), is propagated to P(q,t) by 
G{q, t; q' , 0), and will reach the stationary distribution, 
Pstat(q)=P{q, oo), with long enough t. 

For instance, Pmit(q) can be written as S(q — qo) for a 
single-trajectory MD simulation. Here S(q— qo) denotes a 
Dirac 5-function, and qo is the initial conformation of the 
simulation. If the trajectory is simulated in the canonical 
ensemble with temperature T, the stationary distribution 
turns out to be the canonical equilibrium distribution 
P eq {q) oc exp[—(3V(q)]. Here f3 = jt~t' an d V(q) is the 
potential energy of the system. In WED, multiple MD 
(or MC) trajectories are independently generated from 
different initial conformations, q%o,i — ,Nt, and 

with the same simulation condition, the initial distribu- 
tion should be expressed as Pi n it{q) = jfc Ei=a ^(Q—Qio)- 
In the following, we will focus on reconstructing the 
canonical equilibrium distribution P eq {q) with multiple 
MD trajectories that are all with the t— time length and 
are generated with the same equations of motion but ar- 
bitrarily chosen different initial conformations. 



A. Central results of WED 

Except for the possible effect of insufficient sampling, 
the difference between the conformational distribution of 
all the multiple short trajectories and the equilibrium 
distribution P eq {q) solely comes from the particular se- 
lection of the initial conformations. Since every trajec- 
tory is generated by the same simulation algorithm that 
keeps the desired equilibrium distribution, different con- 
formations in a trajectory should contribute equally to 
the final stationary property. Thus, we may specify each 
trajectory i a statistical weight Wi to mimic the equilib- 
rium samples. Then, the average of any physical quantity 
A(q) over the equilibrium distribution can be written as 



(3) 



Here (A(q))i and (A(q)) eq denote the average of A(q) over 
the ith trajectory and the equilibrium ensemble, respec- 
tively. Summation is over all the trajectories. In prac- 
tice, it is reasonable to discard the initial short (local 
equilibrating) segment of each trajectory in calculating 
averages in trajectories. With this scheme, it is possible 
to estimate equilibrium properties of the system without 
demanding the equilibration of each trajectory. When 
distributed computation facility is available, we can in- 
dependently generate lots of short trajectories in different 
computer at the same time, thus the wall-clock time re- 
quired for converged results greatly decrease. Of course, 
the examination of the convergence and ergodicity of sim- 



ulation data, and a proper estimation of {w^ are crucial 
in application. 

While the length of simulation trajectories, r, is not 
very short, the conformational space could be divided 
into a few meta-stable regions, wherein a single r— length 
trajectory could reach local (inner-region) equilibrium. 
In other words, a r— length trajectory may visits one or 
more conformational regions, but it always reach local 
equilibrium in each of its visited regions, the inter-region 
equilibrium may not be reached due to the not long r. 
Thus, for trajectories that started from the same meta- 
stable region, the specified weights - although dependent 
on the initial conformations - should be approximately 
identical when reproducing the equilibrium properties. 
Therefore, we could write 



Wi oc 



J a(l) Pinit(q)dq 



(4) 



where a(i) denotes the meta-stable region, in which the 
ith trajectory is started. Instead of explicitly using 
Eq. (J4|), we simply construct a new ensemble of con- 
formations —X— in practice. X is constituted by the 
conformations in the initial t— length of all the simu- 
lation trajectories. Within short enough t (compared 
to r), each trajectory is supposed to be still explor- 
ing the same meta-stable conformational region which 
it started from, leading to the quite plausible conclusion 
that f a Px{q)dq ps f Pi n it{q)dq for all the meta-stable 
regions. Here, a denotes the meta-stable region in consid- 
eration; Px (q) denotes the probability density function of 
the conformations in X. Consequently, Px(q), instead of 
Pinit{q)i could be used to estimate the trajectory weights 
as follows, 



Wi = (Q X ,eq(q))i+- 



(5) 



Here £lx,eq{q) = , (■ ' ' )i+ denotes the average over 



the initial t— length segment of the ith trajectory, i.e. 



(A(q)) 



1 



A{q t {t'))dt\ 



(6) 



for any conformational function A(q). Here qi(t') de- 
notes the conformation at t' time in the ith trajectory. 
Heuristically, by averaging Qx,eq{q) over the initial short 
segment of each trajectory, we are calculating {wj ac- 
cording to the initial regions of the trajectories, rather 
than solely by the initial conformations or explicitly by 
metastable region which the trajectory start from. 

Although the analytical expression of Px{q) is un- 
known, considering the general relation 



(n x ,eq(q)A(q)) x = (A(q)) ( 



(7) 



for any conformational function A(q), we could linearly 
expand ilx, e q(q) with a complete set of conformational 
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functions (also referred to as basis functions in the fol- 
lowing) {A^(q)}, as 

Vx.eM = 1 + ]T a M $xA»(q) (8) 

with the expanded coefficients as 

a fl = Y / 9„u(Px)(S x A l '(^) eq . (9) 

Due to the non-orthogonalization of basis functions 
5 x A»(q) = A*($ - (A»(q)) x , g^(Px), which is the 
inverse of the variance-covariance matrix g^"(Px) = 
(5xA> 1 (q)5xA''(q))x with (• • • )x denoting the average 
over X samples, should be included. In Eq. (JS]), the 
equilibrium ensemble average {8 x A^{q)) eq actually relies 
on the trajectory weights {wi} through Eq. ([3]). Thus, 
substituting Eq. into Eq. ([5]) gives out the following 
self-consistent linear equations of {w^}, 

G i -w = 0, i = l,...,N t . (10) 

Here G l = {G%\, . . . , GW t ) T , which can be thought a vec- 
tor mapped from the ith trajectory, Gij = Tij — <5y + -A-, 
and 

f « = ■^T,9^(Px)(S X A^q)) l+ (S X A^q)) J , (11) 

where i,j = l,-- - ,N t , and J2i w i = ^t- w — 
(wi, . . . ,WN t ) T is the vector of trajectory weights. All 
the parameters in Eq. (fTO)) could be calculated simply 
by averaging basis functions over simulation trajectories. 
Since the current formulation is exempted from any in- 
formation about Pinit{<l), initial conformations could be 
arbitrarily selected from different sources, such as coarse- 
grained simulation, high-temperature simulation, exper- 
imental knowledge, or even theoretical conjecture (pos- 
sibly with short local-relaxation running before using as 
initial conformations of WED), then the conformational 
space could be more efficiently traversed and structured. 

By minimizing the residual term / — \G l ■ w\ 2 , 
Eq. (|10p can be written in a symmetric form, which is 
better for analytical treatment, 

Hw = 0. (12) 

Thus, H = G z G l is the variance-covariance matrix 
of the trajectory-mapped vectors {G 1 }, (i — 1, . . . , N t ), 
with at least one zero-eigenvalue as the ground state of 
H, since Yli G % = 0. If the ground state of H is non- 
degenerate, w is uniquely determined by Eq. (|12p , and 
can be subsequently applied to reconstruct the equilib- 
rium distribution of the system. However, it is possible 
the ground state of H is degenerate, it indicates non- 
unique weights of trajectories for reconstructing equilib- 
rium distribution. If the simulation trajectories sepa- 
rately visit some separated conformational regions, and 



there is not any trajectory connect with different regions 
by transition, the relative weights of the separated re- 
gions will be indeterminable from the trajectories. It is 
indeed the reason the weights of trajectories are non- 
unique. Thus it is easy to know that the degree of de- 
generacy of the ground state equals to the number of the 
kinetically separated states of the system under the cur- 
rent simulation timescale. In practice, we usually align 
the eigenvectors of H in terms of their corresponding 
eigenvalues with an increasing order. The eigenvalues of 
H will grow from zero to 1.0, and the number of the 
eigenvalues that are obviously smaller than 1.0 gives the 
number of kinetically isolated (or partial-isolated) states 
in the system under the timescale of the individual tra- 
jectory. The eigenvalues closed to 1.0 corresponds to the 
statistical fluctuations of these simulation trajectories. 
By projecting the vectors, {&}, (i = 1, ■ ■ • , Nt), to the H 
eigenvectors with zero (or near zero) eigenvalues, the tra- 
jectories are mapped into a low-dimensional space, where 
the non-transition trajectories in different states and the 
transition trajectories between these states can be iden- 
tified easily. 



III. DISCUSSION AND IMPROVEMENT OF 
WED 

A. Completeness of basis functions 

Although the expansion in Eq. ([5]) is exact only while 
the basis set is complete, it is not necessary to include 
too many basis functions in WED. This is because only 
the mean values of ^lx.eq(o) over a large number of con- 
formations rather than whole the high-dimensional func- 
tion n x ^ eq (q) itself are required in Eq. (|12[) . In practice, 
only physically relevant and important quantities of the 
system need to be selected as basis functions so that the 
different meta-stable conformational states are effectively 
distinguished. The selection does not demand much fore- 
knowledge. For biological macromolecules, the important 
inner coordinates, such as dihedral angles and some in- 
teresting atomic pair distances, could be chosen as ba- 
sis functions to characterize the conformational motion. 
Various physical quantities, such as the potential energy 
of the system, solute-solvent interactions, could also be 
included for the searching of related kinetic or thermo- 
dynamic phenomena. The variance-covariance matrix 
will ensure the consistent consideration of different 
classes of basis functions, and provide the measure of 
their relative importance (after orthogonalizing the ba- 
sis functions based on g^ v ). In addition, the number 
of independent basis functions is also limited by the fi- 
nite size of X, thus more selected basis functions will 
be discarded after considering the matrix g^, then do 
not significantly contribute to final results. Therefore, 
the errors due to finite basis functions might be relevant 
to the statistical errors of the finite samples. In princi- 
ple, £lx,eq(q) is a (high-dimensional) continuous function 
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in the conformational space q, in practice, it should be 
thought as discrete values of the function in the finite- 
size X sample, {Qx, eq (<?*)}, where {q*}, k = 1, • • ■ , Mj, 
are conformations in the X sample with the size Mx- If 
supposing the conformational space can be divided into 
some meta-stable regions, since both Px(q) and P e q(q) 
are proportion to the equilibrium distribution inside each 
meta-stable region, Qx. eq (q) is almost a step function, 
which is constant inside each meta-stable conformational 
region. Therefore, even no many basis functions in ex- 
pansion of £lx,eq still may provide a correct estimate of 
weights of trajectories based on Eq. (fT2|) . 

The difference between two samples (or the corre- 
sponding probability density functions), such as Px{q) 
and P eq (q), can be measured as 



L 



X,eq 



= <( 



Peq(q) 



~lf)x. 



(13) 



Px{q) 

Here the difference L 2 X eq is non-negative. A nearly 
zero L 2 X eq indicates the equivalence between Px (q) and 
P eq (q), and a larger L Xeq claims the further necessity 
for re-weighting the simulation trajectories. It is worth 
to mention, since L 2 X eq is asymmetric between Px{<T) 
and P eq (q), it is not a normal distance between the two 
probability density functions (or samples) |44| . 

Considering Eq. © , an approximation of L\ eq can be 
written as 



L 



X,eq 



s 



x,e q = Y.3APx)(5xA^) eq (5 x A v ) 



(14) 



with a finite-size set of basis functions {A^(q),jjL — 

X,eq 



1, • • • , n rn \. With enough basis functions included, S 2 



will approach L 2 X , but will never exceed it. Since L 2 X 
is usually undeterminable, the reasonable method to as- 
sess the completeness of basis functions is to observe the 
behavior of S x eq itself. If enough basis functions in cer- 
tain class, such as the functions of the dihedral angles 
of a protein molecule, have been included in calculation, 
S X eq should no longer increase with the number of basis 
functions, showing a saturation behavior. The reasoning 
is the same when multiple classes of basis functions are 
included. 

To calculate S x eq , the sample for equilibrium distri- 
bution P eq (q), thus the trajectory weights {wi}, are nec- 
essary. We propose two methods to bypass this diffi- 
culty. First, before re- weighting, P eq {q) can be approx- 
imated by the trajectories with equally weighting, i.e., 
all {wi} are set to 1 at this stage. With this approxi- 
mation, we roughly select basis functions. Second, after 
re-weighting, S x eq can be calculated with the resulting 
trajectory weights. This procedure can be used to test 
the former selection. If the rigorously calculated S x eq 
has reached its saturation region with previously roughly 
selected basis functions, the results should be satisfiablc. 
However, this posterior examination is restricted to the 
cases that the trajectory weights can be uniquely deter- 
mined. In practice, the distribution by equally weighting 



trajectories, denotes as P, has the same local equilibrium 
characteristics in each meta-stable region as Px and P eq , 

fl x p = Pxfq) 1S similarly a step function, which is also 
constant in each meta-stable region, only the constant 
values are different from those of £lx.eq- Thus we believe 
the a priori test is already enough, and the following re- 
sults are all calculated by setting {wi} to 1. 

Since the statistical errors exist due to the finite sample 
sizes, S X eq would not strictly stop growing with the num- 
ber of basis functions, it just increases slowly compared to 
the initial growing phase with increasing basis functions. 
Similarly, when large error occurs, S x eq may abnormally 
increase within the saturation region. To prevent large 
statistical errors, we design a scheme to get rid of the 
statistically unreliable basis functions. We can orthogo- 
nalize the functions {SxA* 1 ^}, leading to the explicitly 
identifiable value and error contribution from the orthog- 
onalized basis functions to S Xeq . In current paper, we 
implement the Gram-Schmidt orthonogalization method 
to one-by-one derive out the set of orthogonalized basis 
functions {8xA^(q)} , thus 



C2 

°X,eq 



where 



(SxAy eq 



M ((5xA») 2 ) 



(15) 



(16) 



x 



To estimate the error of S x eq , we approximately esti- 
mate the error of s^, and combine them by 



Err(Sl eq ) 



Here, ErrQ denotes the statistical error of the quantity in 
brackets. Err(s^) can be estimated from Err((8xA^) eq ) 

and £>r(((<5x^') 2 )^)j which are the standard error of 
sample averaged quantities, and can be estimated by nor- 
mal statistical methods. For each orthogonalized basis 
function, 8xA^, if Err(s^) is larger than |s M |, i.e., the 
relative error of is larger than 1.0, this basis function 
will not be selected in expansion. One-by-one selection 
will give out the final basis function set and the estimated 
error of the completeness measure S x eq . 

The scheme introduced here provides us the knowledge 
about how many basis functions are enough for WED 
calculation. After orthogonalization, the contribution of 
the basis functions to S x eq clearly reflect their relative 
importance in trajectory re-weighting. The basis func- 
tions that contribute more are more important for por- 
traying the difference between Px (q) and P eq (q) , and the 
ones that contribute quite small can be linearly expanded 
by other existing basis functions. Furthermore, various 
kinds of basis functions can be safely added to investigate 
the related phenomena. For illustration, we will show 
that the basis functions of dihedral angles and those of 



inter-atomic distances can be consistently treated in cal- 
culation, and the effect of solvent on the conformation of 
biological macromolecules can be studied by additional 
basis functions characterizing the solute-solvent relation. 



B. Effects in t 

There is only one free parameter in WED method, 
t* = t/r, for collecting the X samples. Large t* may 
bring systematical error to the estimated equilibrium 
properties, and small t* will reduce the sample num- 
ber in X, leading to a larger statistical uncertainty. 
t* = 0.01 ~ 0.1 is usually applied in the current work 
with satisfiable results obtained. We also point out that, 
for the purpose of discovering the states in conforma- 
tional space, the results are not very sensitive to f*. 

t actually reflects the timescale of the process that we 
are interested in. If it were set to the single trajectory 
length r, then the trajectory weights solved by Eq. (fT2]) 
will be equal to each other. In this case, we are focusing 
on the processes with a timescale much longer than t, 
thus all the trajectories are still in the relaxation to local 
equilibrium, and should have the same weight. Usually, 
if a system can be divided into several metastable states 
under the focused timescale, i should be selected com- 
parable to or smaller than the smallest life time of these 
states. Given that the kinetic states of the system are 
frequently unknown, t should be selected relatively short. 
At the zero limit of t, the WED method turns into the 
classical method described by Eq. (|A1|) with larger weight 
fluctuation. In practice, it is possible to use shorter and 
shorter t and check the convergence of the re-weighted 
free energy surface. Thus, a proper t may be found as 
a compromise between correctness and statistical relia- 
bility (small fluctuation of trajectory weights). Another 
practical consideration is that, a few percent of r would 
be the safe choice for t. This is because the transition 
events with timescales prominently smaller than r have 
reached equilibrium in single trajectory (thus do not need 
re- weighting) , we only need to focus on the ones with 
time-scales similar to or slightly smaller to r. In here and 
former applications, we usually choose t/r in the interval 
of [0.01,0.1], and the results are satisfiable. Supposing 
the longest t— comparable inter-state transition time is 
T m , if r > r m by a large margin, the simulation time 
would be longer than necessary, thus, the trajectories can 
be shortened to realize the considered case (although it 
is not necessary). If r < r TO by a large margin, it is quite 
possible that the conformational space is disconnected 
somewhere. With current simulation data, we can only 
obtain the knowledge of the state structure in conforma- 
tional space, instead of the intact equilibrium properties 
of the system. The considered case will be true for con- 
nected sub-regions in the conformational space. We also 
notice that the state structure of the system is directly 
reflected from the eigenvalues of H matrix, which is not 
sensitive to t selection. Thus, to detect the kinetic states 
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of the system, i need not to be very small. 



C. Tactic for statistics improvement 

For this purpose, we can generate a few trajectories 
from each initial conformation, and specify the same 
weight to the trajectories from the same initial conforma- 
tion. Suppose there are n s initial conformations (denoted 
as gio,Q20, ■ ••, <7n s o in the following), and mi, m 2 , ...,m n , 
trajectories initiated respectively from these initial con- 
formations (the total number of trajectories is still N t ), 
Eq. (fT0|) can be rewritten as 

E(E G ^)^ = ° ( 18 ) 

fe=l 3 

where Wk is the weight for all the ink trajectories initiated 
from (ffcOj T is a N t -by-n s matrix with zero and unity ele- 
ments. Tjk is unity only if the jth trajectory starts from 
the kth initial conformation. Since the number of equa- 
tions is more than the number of trajectory weights, we 
estimate Eq. (fl8l) by minimizing I — \GTw\ 2 . The vec- 
tor of trajectory weights, w, corresponds to the ground 
state of matrix H = T T G T GT. The eigenvalue of the 
ground state should be zero in principle, but in practice, 
it might be a small positive value due to statistical errors. 
For the newly defined H matrix, the number of the de- 
generated ground states still corresponds to the number 
of disconnected metastable states under the timescale of 
the individual trajectory. The existence of a small frac- 
tion of transition trajectories can slightly split the ground 
state of H . We name this method as SI1 in the following 
of this paper. 

Alternatively, we can merge the trajectories with the 
same initial conformations to a single one, and perform 
the standard WED analysis. Concretely speaking, we 
can calculate the average of certain variable in the initial- 
segment sample set, X, by, 

1 n. 

n s ^— ' q '° 

i—l 

where (A) + means the average of A over the initial seg- 
ments of all the trajectories with the initial conformation 
qio. By doing so, more samples are included in calcula- 
tion, and effectively, both the initial region and the whole 
area in the conformational space which the trajectory ex- 
plores become statistically more reliable. This method is 
named as SI2 in the following. We will illustrate the 
application of SI1 and SI2 for depressing the statistical 
noise in the systems with entropic states, where the inter- 
trajectory difference in the same metastable state is large. 
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IV. RESULTS AND DISCUSSION 
A. Alanine dipeptide at room temperature 

For the system of explicitly solvated alanine dipep- 
tide, we generated 500 MD trajectories at 300 K tem- 
perature, each with t = 600 ps length in the NAMD 
package (46j with CHARMM27 force field. In the simu- 
lations, a few potential barriers were added on the dihe- 
dral energy terms of the standard force field to kinetically 
separate the conformational space at room temperature 
(about 300 K). The detailed simulation settings can be 
found in Ref. [42]. The alanine dipeptide molecule, to- 
gether with the initial conformations of simulation are 
shown in Fig. [TJ where various notations of the atoms, 
dihedral angles and free energy wells of this molecule are 
introduced. Here, by selecting different classes of basis 
functions, we will focus on the effects of different phys- 
ical variables on the observed substate structure of the 
system and the improvement of statistical errors. 



1. Dihedral angles as basis functions 

With thorough investigation, the two dihedral angles 
(j) and ijj have been recognized as the major coordinates 
that characterize the conformational motion of alanine 
dipeptide molecule. We adopted the first-to-fourth or- 
der two-dimensional trigonometrical functions of <f> and 
ip as the basis functions in our WED analysis. Calcu- 
lating with this set of basis functions, the 10 smallest 
eigenvalues of H are shown in Fig. [UJa). The smallest 
zero eigenvalue is intrinsic due to the construction of H. 
The second eigenvalue corresponds to the kinetic separa- 
tion between the region with <f> > (containing Cj x and 
ar,) and the region with 4> < (containing Cj q and an). 
Consequently, by projecting vectors {G 1 } to the second 
eigenvector of H (i.e., through {L\}) : the trajectories can 
be clearly classified into the above mentioned two regions. 
The third eigenvalue of H corresponds to the partially ki- 
netic connection between the C^ q and an states. Only a 
fraction of the trajectories in C^ q can climb over the free 
energy barriers to the state, and vice versa. Thus, 
we may use the values to classify the trajectories 

that are confined in the super-state involving Cj q and 
an into three groups, i.e., the non-transition trajectories 
in the two potential wells, and the transition trajectories 
between them. Since the two potential wells are well de- 
fined with their internal relaxation time smaller than the 
transition timescale, we can manipulate the trajectories 
to predict the transition times and locate the transition 
state ensemble [42J. The fourth smallest eigenvalue of 
H is also smaller than 1.0. Although it no longer indi- 
cates new state in the conformational space and should 
be taken as the reflection of statistical difference between 
trajectories, we recently find that, by observing {Lf}, 
two trajectories that transiently walk into the ckl region 



can be picked out. One of the two trajectories initially 
located in the oll region, then quickly moved into C% x , 
another just occasionally visited oil,. Actually, at the 
temperature of 300 K, the oil region seems not very sta- 
ble with the applied force field. The other eigenvalues 
of H are very close to 1.0 without observable meaningful 
information. In the previous paper [42j], we also pointed 
out that the results here can be reproduced only with 
the twelve first-to-second order two-dimensional trigono- 
metrical functions, i.e., the trigonometrical functions of 
if) and <p (without multiplication to any integer), and the 
one-time multiplication between these functions of the 
two adjacent dihedral angles. 



2. Inter-atomic distances as basis functions 

After introducing the major results obtained by ana- 
lyzing with the functions of dihedral angles, we can begin 
to test the case that the inter-atomic distances of ala- 
nine dipeptide are taken as the basis functions. We will 
compare two aspects, i.e., the eigenvalues of H and the 
manipulations on {Lf\ for extracting the kinetic infor- 
mation between C^ q and ctR. There are 10 non-hydrogen 
heavy atoms in alanine dipeptide. Except for the directly 
bonded atoms, 36 inter-atomic distances exist between 
the non-hydrogen heavy atoms. Together with the two 
distances for the possible intramolecular hydrogen bonds 
(see Table H]), we select 38 distances as basis functions. 
The resulting eigenvalues of H are shown in Fig. [SJa). 
While the three smallest eigenvalues are almost the same 
with the results obtained with the functions of dihedral 
angles, the fourth eigenvalue is prominently lifted to a 
value closer to 1.0. However, the discrimination of the 
two a^-visited trajectories is still possible. We plot the 
{L3} values calculated with the trajectories truncated to 
80-percent length (i.e., the 0.2r ending segment of the 
trajectories are discarded in analysis) versus the ones 
calculated with full trajectories [42| in Fig. |2Jd). For 
comparison, the same figure generated with the first-to- 
second order two-dimensional trigonometrical functions 
are shown in Fig. (2[c) . The two figures have quite sim- 
ilar structure. The points corresponding to the non- 
transition trajectories in an and C^ q are plotted as up- 
ward and downward (black) triangles, respectively. The 
points corresponding to the transition trajectories that 
start from an and C^ q are plotted as (red) squares and 
(blue) circles, respectively. The points along the inclined 
dashed lines correspond to the trajectories with only one 
time transition, and the points along the horizontal dot- 
ted lines correspond to the trajectories that do not hap- 
pen transition within the initial 0.8r. Thus the inter- 
section between the dotted and dashed lines correspond 
to a single-transition trajectory with its transition hap- 
pened exactly at 0.8r. These points can be used to pre- 
dict the transition times of the single-transition trajec- 
tories, consequently the transition state ensemble can be 
constructed. All the other points correspond to the tra- 
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jectories with even number of transitions which all hap- 
pen within 0.8t, or with early transitions occurred within 
t, or the multiple transition trajectories with transition 
happened both within 0.8t and after 0.8r. More details 
about detecting transition time thus transition confor- 
mation ensemble from the truncated-trajectory plots are 
presented in Ref. [13] • The similarity between Fig. [2lc) 
and Fig.^d) suggests that the inter-atomic distances can 
also be applied as basis functions for the current system. 



3. Completeness of basis functions 

When the functions of dihedral angles and the inter- 
atomic distances are both included as basis functions, we 
get similar results on the eigenvalues of H and the pro- 
jection values. This property can be predicted by S\ . 
The S\ eq calculated with different number of basis func- 
tions are plotted in Fig. [2(b). Two curves are shown. In 
one of the two curves, the first-to- fourth order trigono- 
metrical functions are taken into account before the 38 
distances, the order between these two classes of basis 
functions is reversed in another curve. As can be seen, 
the two curves reach the same S% within statistical er- 
ror. Apparently, before changing to the other group of 
basis functions, the functions of dihedral angles account 
for larger value of S\ eq than the inter-atomic distances. 
This is because the inter-atomic distances are not the in- 
trinsic coordinates of the system, although they can also 
be adopted to detect the various states in the conforma- 
tional space, but the characterization is not so definite 
as the dihedral angles do. Thus, even the distances has 
been taken into account, there is still independent and 
informative part in the trigonometrical functions, lead- 
ing to the further increase of S\ . Instead, when the 
trigonometrical functions have been considered, the dis- 
tances become non-informative (due to the orthogonal- 
ization process), this explains the fact that lots of basis 
functions are discarded by error analysis in this case [see 
the different number of basis functions in the two curves 
in Fig. [2£b)]. From Fig. [2Kb) , we also find that either 
with the first 12 basis functions of dihedral angles (first- 
to-second two dimensional trigonometrical functions of 
4> and ip) or the first 12 inter-atomic distances (the dis- 
tances between main chain heavy atoms separated by two 
atoms, the distances between main chain and side chain 
heavy atoms separated by two atoms, and the distances 
between side chain heavy atoms) , S\ eq has already en- 
tered into the phase with slower increase. This explains 
why the results obtained by 12 trigonometrical functions 
are similar with the results with 40 ones. We also plot the 
eigenvalues of H calculated by 12 inter- atomic distances 
in Fig. [U(a). The fourth eigenvalue is further lifted, con- 
sequently the identification of a^-visited trajectories by 
{Lf} is not possible. However the behavior of {£f } and 
{£f } is qualitatively similar to the previous results (data 
not shown). 

We also try to add the dihedral angle uo (see Fig.[l| into 



analysis, which has been found to occasionally happen 
cis-trans transition at high temperature [4(| ■ By adding 
the sine and cosine functions of u) itself and its correlation 
to </>, no difference is found in analysis. Indeed, in our 
simulation, the conformations are all found to have trans- 
it angles. 

As a summary, we have compared the results of the 
WED analysis with basis functions either of the dihedral 
angles or the inter-atomic distances. The previous results 
obtained with dihedral angles can be reproduced with 
inter-atomic distances. However, both from the spectral 
properties of H and from the S\ eq term, we conclude 
that the distances are not as accurate as the dihedral an- 
gles for describing the structure of current model system. 
The results obtained with the inter-atomic distances may 
get worse when studying larger protein molecules, since 
the selection of characteristic inter-atomic distance be- 
comes more complicated. Thus, for the application of 
WED method to proteins, we suggest to use the functions 
of the dihedral angles as the preferential basis functions. 
For two adjacent main chain dihedral angles, the first-to- 
second order two-dimensional trigonometrical functions 
are the proper choice. If only the correlation between ad- 
jacent dihedral angles were taken into account, the num- 
ber of basis functions would be proportional to the chain 
length of the simulated molecule, which is practical for 
implementation. 



4- The effects of solvent molecules 

For solvated biological macro-molecules, the conforma- 
tional change is sometimes correlated to the surround- 
ing solvent molecules. In WED, since the solvent-related 
properties can also be selected as basis functions, it be- 
comes possible to systematically one-by-one check and fil- 
ter out the variables that may affect the thermodynamic 
and kinetic properties of the solvated molecule. Here, 
for illustration, we study the solvated alanine dipcptide 
system with 91 basis functions to search for the solvent- 
related effects. These basis functions are classified into 
three groups, which respectively reflect the internal free- 
dom of alanine dipeptide (denoted as P), the solute- 
solvent relation (denoted as PW) and the structure of 
bulk solvent (denoted as W). The details of these basis 
functions are listed in Table U The eigenvalues of H are 
shown in Fig. [3ja). As can be seen, at 300K tempera- 
ture, the inclusion of the solvent-related basis functions 
does not affect the spectral property of H. Particularly, 
the eigenvalues that are prominently smaller than 1.0 do 
not change, reflecting the unchanged physical division of 
the conformational space. Only a few eigenvalues pretty 
close to 1.0 are slightly lowered due to the added ba- 
sis functions. The above results suggest that, for the 
current system, the solvent related properties have al- 
most reached equilibrium in single trajectory. Conse- 
quently, these properties contribute quite a little to the 
inter-trajectory difference, and are unlikely to be strongly 
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coupled with the dynamic process of the solute molecule 
under the simulated timcscalc. 



B. Alanine dipeptide at low temperature 

We also apply the WED method to the low- 
temperature simulations of the solvated alanine dipep- 
tide. First, the 500 conformations selected from a 600 K 
simulation in the NAMD package [46| are first relaxed by 
simulating at 150 K for 200 ps, then continuously simu- 
late r = 800ps, and collect conformations every 0.2 ps for 
subsequent analysis. Standard CHARMM27 force field is 
adopted in all the simulations. 

We note that, at this temperature, the physical phe- 
nomena are quite complicated; the force field and the 
simulation procedure that we adopt may not reflects the 
physical reality. Here, we are only meant to illustrate 
the application of the WED method in the extreme con- 
dition, instead of providing physical insight into the sys- 
tem. 

The eigenvalues of the H matrix are calculated with 
the 91 basis functions listed in Table [l] and are plot- 
ted in Fig. [Hb). Different from the results at 300K, at 
T = 150K, there are lots of eigenvalues deviating from 
1.0, which gradually grows from zero to larger values. Be- 
sides, the solvent-related quantities do affect the eigen- 
values of H now, leading to even more small eigenvalues. 
Correspondingly, the S\ , as shown in Fig.[3]Jd), almost 
always increases with the number of basis functions. At 
such a low temperature, the free energy surface becomes 
quite rough, the diffusive dynamics 47[ in the conforma- 
tional space slows down seriously. As a result, the sim- 
ulation trajectories may only explore a small region in 
the conformational space. Some degrees of freedom, like 
the solvent motion around the solute molecule and the 
conformational motion perpendicular to the (f>-ip plane, 
are no longer equilibrated in single trajectory. Thus, the 
inter-trajectory difference becomes larger, more states 
are detected automatically, and more basis functions are 
required to expand the rugged distribution of the sam- 
ples, leading to the abnormal behavior of the eigenvalues 
of Hand S^ eq . 



Manipulating the eigenvectors of H can provide fur- 
ther information of the simulation data. The projections 
of {G 1 } to the second and the third eigenvectors of H 
pick out two completely separated states, Cf x and the 
super-state containing an and Cj 9 [Fig. 0Ja), a t current 
temperature, the <fi and ip angles may not be enough to 
characterize the conformational motion, thus the nota- 
tions, C$ x , an and C^ q , are used mainly for the purpose 
of illustration]. By the aid of the histogram of {L 3 } [see 
Fig. Ufa), inset], we can roughly classify the trajectories 
inside the super-state into three groups, i.e., the trajec- 
tories in the leftmost two bins, in the rightmost two bins 
and the remaining ones. The conformational distribution 
of trajectories in different groups are shown in Fig. SJc). 
The two groups of the leftmost or rightmost bins both 



possess single peak distribution, either in C^ q region or 
an region, just like the non-transition trajectories iden- 
tified in the simulations at higher temperature. In con- 
trast, the trajectories in the remaining group have a sig- 
nificant occupation in the intermediate region between 
Cj q and oir. Obviously, these trajectories are trapped at 
the intermediate region due to slow dynamics, and are 
different from the transition trajectories identified with 
the similar procedure at 300K. We plot the {L 3 } cal- 
culated with 80 percent trajectory length with the ones 
with full trajectory length in Fig. [He). The behaviors 
presented in Fig. [5] no longer exist. In terms of kinet- 
ics, the transition between the two regions, olr and Cj q , 
can neither be approximated as Markovian process nor 
described with first-order rate constant any more. Con- 
sequently, with the current simulation data, it is only 
possible to reconstruct the intrastate equilibrium distri- 
bution in the local regions of conformational space, or to 
study the more localized kinetic behavior. 

Due to the numerous local minimums at the low tem- 
perature, the cxr and C^ q regions are just like two en- 
tropic states. We also test the statistics improvement 
method in this system. With one-fifth initial conforma- 
tions and five trajectories from each conformation, the 
inter-trajectory difference, thus the statistical noise, is 
indeed suppressed as shown in Fig. SJd) • 



V. CONCLUSION 

In the current paper, we apply the WED method to 
a few extremal model systems, including a system with 
multiple-state transition network, a system with entropy- 
dominated state, and the solvated alanine dipeptide sys- 
tem. The solvent-related basis functions and the low- 
temperature properties of the dipeptide molecule are also 
studied. For the first two examples, WED can correctly 
resolve the topology of the transition network. For the 
alanine dipeptide system simulated at 300 K with modi- 
fied potential, we find the redundant basis functions (in- 
cluding functions of both the dihedral angles and the 
inter-atomic distances) are consistently treated in the 
WED analysis, and the inclusion of the solvent-related 
basis functions gives out the expected results. At pretty 
low temperature of 150 K, the roughness of the free en- 
ergy surface is reflected in the results of WED. The con- 
formational space can still be roughly divided into parts, 
however, the kinetics becomes too complicated to be un- 
ambiguously characterized. As a systematic method for 
exploring the conformational space, WED does not re- 
quire much priori knowledge and approximation, thus its 
results can honestly reflect the thermodynamic and ki- 
netic information of the system. For biological macro- 
molecules, the usually expected hierarchical structure 
can be extracted out step-by-step. For these complex 
systems, the structure of the conformational space may 
be much more complicated, these examples studied here 
become quite relevant for physically interpreting the re- 
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suits. Along with the applications, we design the S\ eq 
quantity for checking the completeness of the selected ba- 
sis functions, and the schemes for statistics improvement. 
These tactics are also proved to be helpful for portraying 
the conformational space. 
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APPENDIX A: TRAJECTORY- REWEIGHT 
SCHEME 



Direct weighting of trajectories from analyzed 
distributions 



Suppose we already know the initial distribution, 
Pinit{o)i of the simulation trajectories, it is possible to 
assign each trajectory a weight 



Pinit (q) 



(Al) 



where qio is the initial conformation of the ith trajec- 
tory. If Pinit(q) is the summation of a finite number 
of (5-functions, this method leads to trajectory weights 
with singular value. However, if the initial conformations 
are selected from a known distribution Po(q), Pinit (q) 
can be approximated by Po(q) to avoid the singularity. 
For example, suppose the simulation system has the po- 
tential energy V(q), and the initial conformations are 
selected from the simulation with the potential energy 
U(q), which might be a more flat version of V(q) for im- 
proving the sampling efficiency, then the weight of the ith 
trajectory should be Wi = exp[/3£/(<f i0 ) — PV{qio)}. This 
weighting scheme is actually proper for trajectories with 
arbitrary length r. In the limit of t — ► 0, it reduces to the 
classical scheme for regenerating the desired distribution 
from a simulation with a modified potential[24J. How- 
ever, since the energy fluctuation for a system with par- 
ticle number N$ is approximately proportional to y/Ns, 
for large systems, the weights in Eq. (|A1|) suffer from 
huge fluctuation. It is quite possible that the weights of 
a few trajectories prominently outweigh the others, then 
only the samples in these trajectories contribute to the 
final results, while the significance of the other trajec- 
tories has been depressed to nearly zero. Besides, the 
substitution of the initial distribution function combined 
by 5-functions by an analytical distribution is only valid 
when a large number of trajectories are generated for 
analysis. 



2. Weighting of not-so-short trajectories 

One should notice that there are actually more than 
one appropriate weighting schemes for the same set of 
simulation trajectories. Taking the long time limit of 
the trajectory length t to ensure that every trajectory 
has been in equilibrium over the whole conformational 
space, then the trajectories should be assigned with equal 
weights. This choice brings the largest effective sample 
size, thus is better than Eq. (|A1[) . When the simulation 
time is neither zero nor long enough, it is plausible to 
expect that the proper weights for the trajectories will 
gradually relax from the ones in Eq. (|A1|) to the case 
of equal weights as simulation time increases. This con- 
jecture can be explained with probability evolution. As 
we all know, the relaxation from Pi n it{q) to P e q(q) is a 
process with multiple timescales, some events may relax 
faster than the others. Usually, the temporal hierarchy 
is highly correlated to the conformational degrees of free- 
dom, i.e., the relaxation processes with short timescales 
are related to local conformational adjustment, and the 
ones with long timescales are related to global confor- 
mational transformation. Consequently, if each r-length 
trajectory is able to reach local equilibrium within cer- 
tain conformational region (metastable state), then only 
the overall difference between P, mf (q) and P eq (q) in each 
local state need to be considered in weight estimation, 
making it possible to reduce the fluctuation of trajectory 
weights. 

In practice, within a short time t at the beginning 
of a simulation trajectory, some fast degrees of freedom 
of conformational motion may have relaxed, thus, intu- 
itively, the trajectory has approximately reached equilib- 
rium within certain local area, a, around its initial con- 
formation. If the initial conformations of two trajectories 
are so close in the conformational space, that both of the 
two trajectories explore the same area within t, then the 
difference between the two trajectories in terms of the ini- 
tial conformation should be almost wiped off, leaving the 
two trajectories contributing nearly the same to the equi- 
librium properties. Generally speaking, for the trajecto- 
ries that start and stay in the same area within short t, 
similar trajectory weights should be assigned with promi- 
nently reduced fluctuation. The number of trajectories 
that start in a local area a is actually proportional to 
J a Pinit{q)dq, i.e., the integration of the initial distribu- 
tion over a. However, if the initial conformations are 
selected from the desired equilibrium distribution, the 
number of trajectories that start in this region should be 
proportional to J P eq (q)dq. Consequently, we can spec- 
ify the weight 



f a p eq(q)dq _ / P ini t(q) 



J a Pinit (q)dq \ P eq (q) 



(A2) 



eq,a 



to all the trajectories from a region. Here, () eq , a denotes 
the average over the equilibrium distribution restricted 
in the a region. Considering the plausible local equili- 
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bration of trajectories, Eq. (|A2[) can be reformulated to a 
time-average form as below, 



11 u 




=9i« 



dt 



(A3) 



where u>i is the weight of the ith trajectory, qi(t) is the 
conformation in the ith trajectory at time t. Interest- 
ingly, the time-average formula does not refer to the as- 
sumed local areas any more. The trajectory weights can 
be calculated by averaging the ratio of the two distribu- 
tion functions over the initial t-length of each trajectory, 
and the trajectories will automatically detect the local ar- 
eas. Usually, t should be short compared to r to guaran- 
tee, as far as possible, that the trajectories initially close 
to each other are exploring the same local area which is 
a local equilibrium meta-stable state in the r time scale. 
We will further discuss this issue in Sec. IIII Bl 

In Eq. (|A3|) . due to the average over a few samples in 
each trajectory, the fluctuation of {wi} will be lowered 
to certain extent. However, for large systems, this is 
still not enough. Because the reduction of fluctuation 
is approximately proportional to -Jni, where nj is the 
number of frames within t time, which is incomparable 
with the exponentially expanded fluctuation of ^"'^ffi"- 1 . 



3. Weighting of trajectories based on the 
expansion of probability density 

In view of the above, we propose additional scheme 
to further reduce the fluctuation of trajectory weights. 
For systems like solvated biological macromolecules, the 
large fluctuation usually does not come from the phys- 
ically important degrees of freedom. For example, the 
solvent molecules account for most of the energetic fluc- 
tuation of the system. In the same time, the physically 
interesting degrees of freedom, say the conformation of 
a macromolecule, are determined by its own internal co- 
ordinates, such as dihedral angles, inter-atomic distances 
and so on. Thus, we can expand the term P p^^ only 
by quantities that are physically interesting to get rid of 
the irrelevant fluctuations. To fulfill this aim, we define 
an inner product between two conformational functions, 
A(q) and B(q), as 



(A(q),B(q))= f dqP eq (q)A(q)B(q). 



named as basis functions in the follows of the paper) are 
selected to approximate P p ™ u ^ as 



Pinit (q) 
PeM 



£ 9^ {P eq )(5 eq A») mit 6 eq A» (q) . (A6) 



Here g^ v {P eq ) is the inverse of the covariance matrix 
of the basis functions, g liU {P eq ) = {5 eq A ll 5 f , q A v ) eq , and 
S eq A(q) = A(q) - (A(q)) eq . Substituting Eq. (TMJ) into 



(A4) 



Eq. (|A3[) . the trajectory weights are found to satisfy the 
following self-consistent equations. 



w t = ll + Y J 9nAPe q )(8 eq A») mit {8 eq A v (q)). l+ \ . 

(A7) 

Here, the time average of the function 5 eq A il (q) over 
the initial i-length segment of the zth trajectory has 
been written as (SeqA^)^. For any function of con- 
formational coordinates, the average over the distribu- 
tion function, Pi n it(<T), can be calculated with the sam- 
ples that are adopted for selecting the initial conforma- 
tions. The average over distribution P eq (q), however, re- 
lies on the trajectory weights through Eq. Conse- 
quently, an iterative method should be invoked to calcu- 
late {wt}, (i = 1, . . . , N t ). Since J p" it (^ is now projected 
to the hyperplane spanned by the selected basis functions 
{5 eq A fl (q)}, the calculated trajectory weights can be used 
to reconstruct the free energy surface along the selected 
degrees of freedom. 

Eq. (|A7| is already practical for getting reasonable tra- 
jectory weights with small fluctuation. However, there 
are still two drawbacks of this method. First, we need 
enough samples of the distribution Pi n a (q) . If the initial 
conformations are chosen from high temperature simu- 
lation, then all of the conformations in the high tem- 
perature trajectory can be utilized. However, if the ini- 
tial conformations are chosen from coarse-grained simu- 
lation, or simply from arbitrary selection, the estimation 
of (A(q)) init -]^J2iA(qio) might be insufficient if N t 
(the number of trajectories) is not large, leading to prob- 
lematic application of Eq. (|A7[) . Second, the iterative cal- 
culation hampers the in-depth analysis of the simulation 
system. When the conformational space has been sepa- 
rated into different free-energy wells without any inter- 
communication within the finite simulation time, the iter- 
ative method may still provide an incorrect answer with- 
out pointing out the broken ergodicity. 



Considering the general relation 

■PM 



PM m ), = {m)l 



(A5) 



for any conformational function A(q) [where (A^)^ de- 
notes the ensemble average of A(q) under the probability 
density Pk(q)], a set of physical quantities {A^(q)} (also 



4. Weighting scheme based on a linear equation: 
WED 



Considering the shortcomings of Eq. (|A7[) . we design 
the final framework of WED, which, to a large extent, 
solves the above mentioned problems, and is still able 
to find out the fluctuation-reduced trajectory weights. 
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Here, instead of using the sample of Pinit(q), we artifi- 
cially construct a new sample set X with its distribution 
symbolically denoted as Px(q)- X consists of the ini- 
tial t-length segments of all the trajectories. If t is short 
enough (compared to r), the trajectories only explore 
their local area within t. Thus, the number of conforma- 
tions in each local area, a, does not change after the t 
relaxation, i.e., 



J a Pinit(q)dq 
J a Px(q)dq 



const., Va. 



(A8) 



Then, substituting Pi n u (q) with Px (q) in calculation, the 
expression of the trajectory weights will become 



ii', 



Px(q) 



\q=q z (t)dt 



P 



(£>/ 



X 



(A9) 



Since the sample set X is supposed to reach local equi- 
librium in each a region (but may not reach equilibrium 
among these regions), we are able to get Eq. (|A9j) with- 
out the inverse sign in Eq. (|A3[) . After redefining the 
inner product between two conformational functions as 



(A(q),B(q)) 



dqP x (q)A(q)B(q), 



(AlO) 



the 



term in Eq. 



Px(q) 

basis functions similar to Eq 



9J) can be expanded by a set of 
1. 



Px(q) 



1 + Y / 9 tl v(Px)(S X A>*(q)) eg 6 x A>'(q). (All) 



Finally, similar to Eq. (TAT)) , substituting Eq. (jATTl) 
into Eq. (|A9|) , and considering the reliance of the average 
over equilibrium ensemble on the trajectory weights, we 
have the following set of linear equations, 



IV; 



1+ E \ ^,,g,APx)(S X A^q)) l+ (SxA^q)) J 

(A12) 



where Yl% w i = N. It is direct to derive the central 
equation of WED, Eq. $T2$, from Eq. (jAT2|) . 



APPENDIX B: HEURISTIC DERIVATION OF 
THE SPECTRAL PROPERTY OF H 



We also provide a heuristic derivation of the spectral 
property of the H matrix. We define the element of a 
matrix T as, 



r, 



l 

~N~t 



-(1 + ^,u9^{Px){S X A^q)) i+ (5 x A"(q)) j+ ), 

(Bl) 

where i+ (or j+) represents the short initial segments 
(t-length) in the ith( or jth) simulation trajectory. If t is 
short enough that most of the trajectories are still con- 
fined in their local area, and in another sense long enough 
that local equilibrium is almost obtained, the elements of 
the matrix T can be expressed as 



1 

N 



P, 



(i) 



Px 



(B2) 



Here Px is the distribution function of the X sample set, 
s(i) represents the state in which the initial segment of 
the ith trajectory locates, and P s ii) is defined as 



Px(q)Xs(i)(q) 
J dqP x (q)Xs(i)(q) 



(B3) 



where Xs(i)(o) is the characteristic function of state s(i), 
which is non-vanishing with value 1 only inside the s(i) 
region. Thus, Ey is not zero only if s(i) and s(J) are 
the same state. Suppose that there are n states in the 
system, i.e., si, S2, . . . , s n , and there are mi, m.2, . . . , m n 
trajectories initially locating in these states, respectively. 
The matrix T becomes a block diagonalized matrix with 
proper numbering of the trajectories. 
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i i 

mi mi 

1 ' . 

tti] 



1 1 

mi mi 



1 

mi 

1 

mi 



1 

mi 



1 

m 2 

1 

m 2 



1 

m 2 



1 

m 2 



1 

m 2 



1 

1712 
1 

m 2 



1 

m 2 



The dimension of the sub- matrix with elements — is m*. 

mi L 

For this matrix, the eigenvalue 1.0 is n fold degenerate 
and the eigenvalue is N t —n fold degenerate. The matrix 
defined as H' = (T — I) T (Y — I) has n eigenvalues equal to 
(equal to the number of states) and N t — n eigenvalues 
equal to 1.0. Where J is a iV t -dimensional unit matrix. 
Another matrix T is defined as 

f« = ^-(1 + Z^g,APx)(5 X A»(q)) l+ (6 x A»(q)) 3 ). 

(B4) 

The only difference between T and T is that the average 
over the initial segments of trajectory j has been substi- 



tuted into the average over the whole trajectory. Con- 
sequently, the possible transition events in trajectory j 
will be reflected in the matrix elements, and T is a non- 
symmetric matrix now. For the simplicity in derivation, 
we suppose that there are two states, s\ and si, in the 
system, with mi and mi trajectories initially locating in 
these two states, and there are x\ (or x 2 ) transition ones 
amid the m\ (or mi) trajectories. Then the matrix T 
should be written as 



l l 

mi mi 



1-01 



mi 



mi 



P*2 

mi 



1 1 

mi mi 



mi mi 
m 2 



1-fei 
mi 

m 2 



1 

m 2 



1 

m 2 



mi 

J_ 

m 2 m 2 



mi 
l-/3x. 



m 2 



0*2 1 1 
m 2 m 2 m 2 



1 

m 2 



l-/3i 



l-/3x 



Here, (or /%) denotes the time proportion of the si (or 
S2)-started trajectory i in state S2 (or s\). For the trajec- 
tories that start from the same state, the row vectors of 
r are the same. Thus, the transition and non-transition 
trajectories can not be differentiated by the row vectors 
of r. For systems with multiple states, the conclusion is 
similar. In WED, the matrix G = {G 1 , G 2 , . . . , G Nt } T is 



directly related to T as 

G = f - I. (B5) 

Hence, the row vectors for the trajectories from the same 
initial state are no longer the same, the deviation of the 
transition trajectories from the non-transition ones are 
linearly related to the distribution of the transition tra- 
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jectories in different states. For the matrix H = G T G, 
considering the perturbation of the transition trajecto- 
ries to the matrix G, the number of the eigenvalues that 
are obviously smaller than 1.0 should still be the same 
with the number of the disconnected states in the con- 
formational space. The value of these eigenvalues should 
deviate from zero due to the transition trajectories. The 
other interesting properties can also be rationalized by 
this heuristic proof. 



APPENDIX C: WED FOR MULTIPLE-STATE 
TRANSITION NETWORK AND 
ENTROPY-DOMINATED STATES 

We illustrate the application of WED in two ex- 
terme model systems, one with a transition network 
among multiple states, the other with an entropy- 
dominated state. The first model system which with 
Two-Dimension-Quad- Well(TDQW) potential has four 
degenerate states [four potential wells with exactly the 
same shape, see Fig. EJa)], and four potential barriers 
of equal height separating them apart. The second sys- 
tem which with Mexican-Hat (MHAT) potential has two 
states [see Fig. EJb)]. The outer state has large entropy, 
thus, its intrastate relaxation time may be comparable 
to the time-scale of inter-state transition. We will first 
analyze the WED trajectories of these two systems under 



different temperatures. Then, we will discuss the topics 
about basis function selection, t selection and statistics 
improvement. 



1. TDQW and MHAT systems 

For the two-dimensional systems, TDQW and MHAT, 
simulations are restricted in the square region of [— 2, 2] x 
[—2,2]. Reflecting boundary condition is imposed. The 
over-dampened Langevin equations 



dx 
~d~t 

dt 



l dU(x,y) 
7 dx 

l dU(x,y) 
7 dy 



l 2k B T 



7 



'2k B T 



7 



(CI) 



is adopted to generate the dynamical trajectories. Where 
7 is the frictional coefficient, kg is the Boltzmann con- 
stant, T is the simulation temperature, £i(t) and ^(t) 
are the white noises satisfying = — 0% 

with () denoting the ensemble average of noise. We sim- 
ply take ks and 7 as unity to get the dimension- reduced 
units for time, position and temperature. 
The potential of TDQW is 



J 



U(x,y) 



00 , |x| > 2 or \y\ > 2 

5(§x 2 -l) 2 + 5(|?/ 2 -l) 2 , |z|<2 and \y\ < 2 



r 



The potential of MHAT is 



U(r) = 40 



, 1 6 2,1 
( — r 



6 * 4 1 ± 2\ 2 2,2 

r -\ — r ), r — x + y 



-21 9 ' 3 

For both systems, 900 initial conformations are ran- 
domly selected if not further mentioned, then WED sim- 
ulation are performed, with each trajectory simulated 
for 50 dimensionless time units. The coordinates are 
recorded every 0.01 time units. The t/r is selected as 
0.04, if not further mentioned. 



2. Multiple-state transition network 

For TDQW system, WED simulations are performed 
at three temperatures of 0.30, 0.85 and 1.50. 900 trajec- 
tories with t — 50 dimensionless time units are gener- 
ated in each set of WED simulation. The eigenvalues of 
H are plotted in Fig. [6fa). As expected, there are four 
zero eigenvalues at temperature 0.3, implying the four 
disconnected states in TDQW system. At higher simula- 
tion temperatures, the second to fourth eigenvalues of H 



deviate from zero to larger values due to the increasing 
fraction of the inter-state transition trajectories. More 
illustrative perspective can be obtained from Fig. [6jb) , 
(c), and (d), where the trajectories are mapped into the 
three-dimensional space by projecting {G 1 } to the sec- 
ond, third and fourth eigenvectors of H (the projection 
to the first eigenvector is always zero; the projection val- 
ues are denoted by L l a = G 1 ■ u a , where u a is the ath 
eigenvector of H). At temperature 0.3, the trajectories 
agglomerate into four disconnected clusters that locate 
respectively at the four vertexes of a tetrahedron. Inside 
each cluster are the trajectories isolated in the same po- 
tential well. At temperature 0.85, part of the trajectories 
can climb over the potential barriers to the other states. 
As a result, the points corresponding to the transition 
trajectories are found to gather along the straight lines 
that shoot out of the vertexes, and the ones correspond- 
ing to the non-transition trajectories are still locating on 
the vertexes. The transition trajectories along a straight 
line are found to connect the same two states. These two 
states respectively correspond to the vertex that emits 
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the line and the vertex that the line approximately points 
to. Thus, between the two trajectory-connected vertexes 
(states), there are two parallel lines, respectively corre- 
sponding to the transition trajectories that start from 
different states. In Fig.[6fc), each vertex only connects to 
two neighboring vertexes, correctly reflecting the topol- 
ogy of the transition network of TDQW system. At tem- 
perature 1.5, it is relatively easier for the trajectories to 
transfer between different states. The tetrahedron struc- 
ture is still preserved, and its interior is filled by multiple- 
transition trajectories which transition among more than 
two states. The eigenvalues of H actually provide clues 
for the preserving of the tetrahedron structure. Since the 
second to fourth eigenvalues are still not close to 1.0, i.e., 
the conformational space can not be sufficiently explored 
by single trajectory, the difference between trajectories in 
terms of their initial conformations still exists, leading to 
the structure in the projection space. At higher temper- 
ature, the inter-trajectory difference will further reduce, 
leading to the diminishing of the tetrahedron structure. 
In that case, it is more proper to assert that there is al- 
most only one state in the conformational space in the 
current simulation timescale (however, each single tra- 
jectory might be not sufficient to reach the global equi- 
librium and the weights of trajectories are still necessary 
to reproduce the global equilibrium due to the fact that 
exists three eigenvalues in the (0,1) interval). In shorter 
time scales, e.g., when each of the trajectories is short- 
ened, the substate structure could emerges. 

Considering the tetrahedron structure and the straight 
lines in Fig. [6jc) , and the small second to fourth eigenval- 
ues of H, various kinetic information can be directly ex- 
tracted from the simulation data of TDQW at T = 0.85. 
The procedure can be (1) arbitrarily picking out the non- 
transition and transition trajectories in two neighbor- 
ing states from Fig. Etc), (2) analyzing these trajectories 
with the WED method (constructing H and analyzing its 
spectral properties), (3) projecting the trajectories into 
low-dimensional space and manipulating the projection 
values. In the paper, we just focus on the abilities of 
WED in detecting meta-stable structure, the construc- 
tion of equilibrium properties of the system based on the 
obtained weights of trajectories is directed [13]. Some 
interesting information, such as the transition state en- 
semble, the first order kinetic rates, and the transition 
times of single trajectories, can be obtained. The pro- 
posed kinetic analysis [42| is based on the assumption 
that most of the transition trajectories only climb the 
potential barriers a few times. If, in each single trajec- 
tory, the transition events that we are interested in can 
happen for many times {i.e., a single trajectory almost 
already reached equilibrium) , we directly treat whole the 
visited region as a super states, or perform a shortening 
of the simulation trajectories (for example, dividing each 
trajectory into a few shorter ones) to analyze the detailed 
kinetics in shorter time scales. The WED method pro- 
vides much flexibility for such manipulations. 



3. Entropy-dominated states 



For MHAT system, WED simulations are performed 
at four temperatures of 0.05, 0.30, 0.85, and 1.50, with 
the same settings as the TDQW system. The 70 small- 
est eigenvalues of H are shown in Fig. [7]Ja). Different 
from TDQW, the eigenvalues of H gradually increase 
from zero at the temperature of 0.05, and lots of small 
eigenvalues exist. The number of the eigenvalues that ob- 
viously deviate from 1.0 are always more than the number 
of physical states in MHAT system, till the temperature 
is high enough that there is only one zero-eigenvalue ap- 
parently different from 1.0. This behavior simply reflects 
the entropic nature of the outer potential well. At low 
temperature, a single trajectory can not reach local equi- 
librium in that entropic state, consequently a few small 
eigenvalues exist due to the inter-trajectory difference (it 
looks like there are many sub-states). At higher temper- 
ature, the trajectories that start from the outer poten- 
tial well usually climb over the potential barrier before 
reaching equilibrium in the whole state, thus the inter- 
trajectory difference will not be extinguished, until the 
two states in MHAT kinetically become one state at high 
enough temperature. The projections of {G 1 } onto the 
second, third and fourth eigenvectors of H are shown in 
Fig. Elb), (c), and (d). At both temperatures 0.05 and 
0.30, the trajectories are divided into two groups. In one 
group, the trajectories dispersively locate on the same 
plane which is perpendicular to the L\ axis. These tra- 
jectories are found to be isolated in the outer potential 
well. In the other group, the trajectories concentrate to 
nearly one point. They are found to be isolated in the 
inner potential well. These two groups of trajectories are 
clearly differentiated by the projections along the sec- 
ond eigenvector of H, i.e., the one indicating the most 
important contribution to ergodicity broken. At tem- 
perature 0.05, each trajectory only explore very limited 
region in the entropic state, leading to the distribution 
on the perimeter of a circle. At temperature 0.30, since 
each trajectory can explore larger area, the circle has 
been filled with points. At temperature 0.85, the trajec- 
tories are found to locate in a cone. The points in the 
bottom surface and the vertex of the cone correspond 
to the non-transition trajectories inside the outer and 
inner potential wells, respectively, and the other points 
are transition trajectories between the two states. Once 
again, the trajectory-projected space clearly provides the 
topological information of the transition network of this 
system. 



Interestingly, due to the symmetry of the outer state 
of MHAT, we can even extract the kinetic information 
of the transition between the inner and the outer states 
(data not shown), although it might not be general for 
other systems with entropy-dominated states. 
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4. Completeness of basis functions 

At temperature 0.85, the S\ eq analysis is performed 
for the two-dimensional systems. We apply the following 
two-dimensional trigonometrical functions 

sin [(to + n)x], cos[(m + n)x], 
sin [(to + n)y], cos[(m + n)y], 
m + n > 
sin (mi) sin (ny), sin (mx) cos (ny), 
cos (mx) sin (ny), cos (mi) cos (ny), 

m>l,n>l (C2) 

as basis functions. Where, (ac, y) are the two-dimensional 
Cartesian coordinates multiplied by a transformation fac- 
tor 7^, to and n are non- negative integers. We define the 
summation of to and n in Eq. (|C2[) as the order of these 
functions, and adopt the first-to-fourth order functions 
(totally 40 basis functions) in our analysis. Based on 
the error analysis mentioned in Sec lIII Al three and two 
basis functions are discarded in the TDQW and MHAT 
systems, respectively. For both of the two systems, S\ eq 
initially grows fast with increasing number of basis func- 
tions, then the growth slows down and shows a saturation 
behavior of S\ eq as expected [see Figs. |8ja) and (b)]. 
Once S\ eq has approximately reached the platform re- 
gion, more basis functions will not change the calculation 
results any more (data not shown). Thus, for TDQW and 
MHAT systems, about 24 basis functions, i.e., the first- 
to-third order two-dimensional trigonometrical functions, 
are already enough. 



5. The effects of t 

We study the effect of t with the simulation data of 
MHAT at temperature 0.85. Since there is only one zero 
eigenvalue of H, the whole free energy surface can be 
reconstructed. The whole conformational space is uni- 
formly divided into small pieces, k = X,...,N p (N p is 
selected as 1600 here, i.e., 40 segments along each di- 
mension), and the distribution probability in each piece 
is reconstructed with different t* — t/r. We define the 
function 5 diet (t*) as 



(**) 



\ k=l 



Here Pk(t*) is the distribution probability in the kth 
piece which is reproduced by WED method using t*. 
ptheo j g ^ ne theoretical distribution probability in the kth 
piece. As shown in Fig. [S£c), 5dist(t*) indeed decreases 
with shorter t* in calculation. For t* in the [0.01,0.1] in- 
terval, the 6dist(t*) values are almost the same, and are 
prominently smaller then the value of the non-weighted 
distribution of the simulation data. In the non-weighted 
(all Wi is unity) distribution, the major deviation from 
theoretical distribution resides in the inner potential well 
of MHAT, where the distribution probability is too high 
for the non- weighted data. In the re- weighting process, 
the trajectories that start from the inner potential well 
are usually specified lower weights [see Fig. [8jc), inset], 
thus the over occupation in the inner state can be off- 
set. The other variation in trajectory weights further 
adjust the intrastate distribution for both the inner and 
the outer states. 



to measure the difference between the reconstructed 
probability distribution and the theoretical distribution. 



6. Statistics improvement 



We also test the statistics improvement methods, SI1 
and SI2. In here and below, we always randomly select 
one-fifth of the initial conformations, then five trajecto- 
ries are spawned from each conformation to test the im- 
provement. At T = 0.85, we apply both SI1 and SI2 to 
reconstruct the energy surface of MHAT. From Fig.[8jc), 
it can be seen that both methods correctly reproduce the 
energy surface with even smaller deviation from the the- 
oretical one, as compared to the standard WED method. 
At T = 0.05, the eigenvalues of the H matrix are cal- 
culated with SI2 method. As shown in Fig. [51(d), no 
matter one trajectory or multiple trajectories are gener- 
ated from one initial conformation, the eigenvalues cal- 
culated by the standard WED method are almost the 
same. However, analyzing by SI2 method significantly 
alters the structure of eigenvalues. By generating more 
trajectories from single initial conformation, and combin- 
ing these trajectories together in analysis, the statistics of 
single trajectory is prominently improved, and the inter- 
trajectory difference for trajectories in the same state 
(or near in conformational space) are reduced to large 
extent. As a result, some of the eigenvalues increases, 
in particular for the ones deviating but close to 1.0. In 
other words, the SI2 method has depressed the statistical 
noise in the spectral analysis of H , leaving the physically 
relevant states recognized. 
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TABLE I: Basis Function List 



Group 


Number 


Function 


Internal freedom 
of alanine dipeptide 


40 

8 


Two-dimensional trigonometric functions of <j> and ip. 

Nl - 01, N2 - 02, N1H - Ol, N2H - 02 and their squares*. 


Interaction between 

solvent 

and solute 


2 
8 

O 
L 

l 

20 


Interaction** between peptide and water. 
Interaction between Ol, 02, Nl, N2 and water. 
Water number around peptide (3.3^4, 5.5^4). 
Solvent accessible surface area. 

Water number around Ol, 02, Nl, N2 (3.3A, 3.7A, 4.1 A, 4.5A, 4.9A). 


Solvent structure 


10 


Integration of the OO radial distribution function, g(r), of the bulk 
water***, i.e., r 2 g(r)dr (r° is selected to be 2.9A, 3.3A, 3.7A, 4.lA, 
4.5A, 4.9A, 5.3A, 5. 7 A, 6.1 A and 6.5 A). 


* The four atoms 01, 02, Nl and N2 are labelled in Fig. [Q left panel. N1H(N2H) is the hydrogen atom 



bonded to N1(N2). Ol-Nl and 02-N2 may form intra-molecule hydrogen bonds, or may be connected by 
water bridge [4q |. 

In here and below, interaction means the electrostatic energy part and the Van de Waals energy part. 
These functions reflect the packing of bulk water. 




FIG. 1: (Color online) Illustration of alanine dipeptide molecule and the initial conformations of WED simulation. The 
alanine dipeptide is shown in the left panel, with some dihedral angles and atoms labeled. The initial conformations projected 
to the <f>-ifj plane are shown in the right panel. The background is the free energy surface at T = 600 K, with the free energy 
wells labeled. 
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FIG. 2: (Color online) WED analysis with the basis functions of dihedral angles and inter-atomic distances. Shown are the 
results for solvated alanine dipeptide, which is simulated at T = 300 K with the modified force field, (a) The eigenvalues of 
H matrix calculated with different sets of basis functions, including the full set of inter-atomic distances (squares), 12 inter- 
atomic distances (upward triangles), first-to- fourth order trigonometrical functions of cj> and ip (circles) and first-to-second order 
trigonometrical functions of <p and ip (stars), (b) The behavior of Sx,eq i s shown either with the inter-atomic distances(Distance- 
Angle) at first, or with the trigonometrical functions of <f> and ip at first (Angle-Distance). The {L^} values calculated with 
truncated trajectories (80 percent length) versus the ones calculated with full trajectories are shown. The results are obtained 
either with the first-to-second order trigonometrical functions of (j> and ip (c), or with the inter-atomic distances (d). The arrows 
label the intersection points between the dashed and dotted lines. 
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FIG. 3: (Color online) The spectral properties of H calculated with the basis functions including solute-solvent relation. 
Shown are the results for the system of solvated alanine dipeptide. The Eigenvalues of H calculated with different groups of 
basis functions are shown in (a) (300 K simulation with modified force field) and (b) (150 K simulation with standard forcefield). 
The functions used in calculation are labeled in the legend. P,PW,W means the three sets of basis functions are all included in 
calculation, with an order of internal freedom of dipeptide, solvent-solute relation and bulk water structure in orthogonalization 
and error analysis. See the main text for more details. The behavior of S% e g with increasing number of basis functions are 
shown in (c) (300 K simulation with modified force field) and (d) (150 K simulation with standard force field). The insets of 
(c) and (d) show the part of Sx, eq curve that is related to the functions of the bulk solvent structure. 
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FIG. 4: (Color online) Results for the low-temperature simulation of alanine dipeptide. (a) The projection of trajectories into 
the space of L\ and L\. The inset is the histogram of {L\} values. The different states, into which the trajectories are roughly 
classified, are labeled, (b) The distribution function along ifi angle for the groups of trajectories classified by the histogram in 
(a). Shown are the distributions for the trajectories in the leftmost two bins (downward triangles), the right most two bins 
(upward triangles) and the other trajectories (circles), (c) The values calculated with truncated trajectories (80 percent 

length) versus the ones calculated with full trajectories, (d) The first 70 eigenvalues of H calculated with normal data set (blue 
stars), multiple trajectory data set (one- fifth initial conformations and five trajectories for each conformation) and normal 
method (red circles), multiple trajectory data set and SI2 method (green squares) are shown. 
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FIG. 5: ( Color online) The two-dimensional potentials, (a) The TDQW potential shown as contour map. The darker regions 
correspond to potential wells, and the lighter regions correspond to potential barriers, (b) The MHAT potential projected 
along one-dimension. Rotating around the dashed line reproduces the two-dimensional potential. 




FIG. 6: (Color online) The eigenvalues and projection maps for TDQW potential under different temperatures, (a) The first 
ten eigenvalues of H matrix calculated at different temperatures. For the three temperatures of 0.3(b), 0.85(c) and 1.50(d), 
the projection of {G 1 } to the second, third and fourth eigenvectors of H are also plotted. 



23 



(a) 



CD 
_3 

CO 

> 
c 

CD 
D) 

Lu 



O 




A. A 



T=0.05 
*-T=0.30 
e-T=0.85 

b-T=1.50 




(b) 



1 70 
Numbering of Eigenvalues 

(c) (d) 





FIG. 7: (Color online) The eigenvalues and projection maps for MHAT potential under different temperatures, (a) The 
first seventy eigenvalues of H matrix calculated at different temperatures. For the three temperatures of 0.05(b), 0.30(c) and 
0.85(d), the projection of {G l } to the second, third and fourth eigenvectors of H are also plotted. 
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FIG. 8: (Color online) The g% curve, effect of t and illustration of statistics improvement. The behavior of Sx eq with 
increasing number of basis functions are shown in (a) (TDQW system) and (b)(MHAT system), (c) The difference between 
the reconstructed equilibrium distribution and the theoretical distribution are shown for MHAT potential at T = 0.85. The 
horizontal axis denotes the method for reproducing the equilibrium distribution. The numbers mean the selected t* = t/r 
values in WED analysis (expressed in percentage ratio). Eq means the results without re-weighting {i.e., Wi = 1). SI1 and SI2 
label the two methods for statistics improvement. The trajectory weights calculated with t* equal to 0.01 are plotted as inset. 
In figure (a), (b) and (c), the statistical errors are also shown (see the main text for more details), (d) The first 40 eigenvalues 
of H for MHAT potential at T = 0.05. The ones calculated with normal data set (stars), multiple trajectory data set (one-fifth 
initial conformations and five trajectories for each conformation) and normal method (circles), multiple trajectory data set and 
SI2 method (squares) are shown. 



